Twelve complete genomes of three novel coronaviruses were sequenced: bat coronavirus HKU4 (bat-CoV HKU4) and bat-CoV HKU5 (both putative group 2c) and bat-CoV HKU9 (putative group 2d). Comparative genome analysis showed that the various open reading frames (ORFs) of the genomes of the three coronaviruses had significantly higher amino acid identities to those of other group 2 coronaviruses than to those of group 1 and 3 coronaviruses. Phylogenetic trees constructed using chymotrypsin-like protease, RNA-dependent RNA polymerase, helicase, spike, and nucleocapsid all showed that the group 2a and 2b and putative group 2c and 2d coronaviruses are more closely related to each other than to group 1 and 3 coronaviruses. Unique genomic features distinguishing among these four subgroups were also observed, including the number of papain-like proteases, the presence or absence of hemagglutinin esterase, small ORFs between the membrane and nucleocapsid genes, ORFs downstream of the nucleocapsid gene (NS7a and NS7b), bulged stem-loop and pseudoknot structures downstream of the nucleocapsid gene, the transcription regulatory sequence, and the ribosomal recognition signal for the envelope gene. This is the first time that NS7a and NS7b downstream of the nucleocapsid gene have been found in a group 2 coronavirus. The high Ka/Ks ratios of NS7a and NS7b in bat-CoV HKU9 imply that these two group 2d-specific genes are under high selective pressure and hence are rapidly evolving. The four subgroups of group 2 coronaviruses probably originated from a common ancestor. Further molecular epidemiological studies of coronaviruses in the bats of other countries, as well as in other animals, together with complete genome sequencing, will shed more light on coronavirus diversity and evolutionary history.


Coronaviruses are found in a wide variety of animals and can cause respiratory, enteric, hepatic, and neurological diseases of varying severity. Based on genotypic and serological characterization, coronaviruses were divided into three distinct groups (3, 12, 36). As a result of their unique mechanism of viral replication, coronaviruses have a high frequency of recombination (12). Their tendency for recombination and their high mutation rates may allow them to adapt to new hosts and ecological niches (8, 33).

The recent severe acute respiratory syndrome (SARS) epidemic, the discovery of SARS coronavirus (SARS-CoV), and the identification of SARS-CoV-like viruses in Himalayan palm civets and a raccoon dog from wild live markets in China have boosted interest in the discovery of novel coronaviruses in both humans and animals (6, 17, 19, 21, 31). In 2004, a novel group 1 human coronavirus, human coronavirus NL63 (HCoV-NL63), was reported independently by two groups (5, 27). In 2005, we described the discovery, complete genome sequence, clinical features, and molecular epidemiology of another novel group 2 human coronavirus, coronavirus HKU1 (CoV-HKU1) (14, 29, 32). Recently, we have also described the discovery of a SARS-CoV-like virus in Chinese horseshoe bats and of a novel group 1 coronavirus in large bent-winged bats, lesser bent-winged bats, and Japanese long-winged bats in Hong Kong (13, 20). SARS-CoV-like viruses have also been identified in horseshoe bats in other provinces of China (15). Based on these findings, a territory-wide molecular surveillance study was conducted to examine the diversity of coronaviruses in bats in Hong Kong, and six novel coronavirus species were discovered (30). In phylogenetic analysis of the RNA-dependent RNA polymerase (pol) and helicase genes, two of these viruses, bat coronavirus HKU4 (bat-CoV HKU4) and bat coronavirus HKU5 (bat-CoV HKU5), appeared to form a distinct subgroup within group 2 coronaviruses.

In the present study, we extended our survey to include specimens of bats in the Guangdong province of Southern China, where the SARS epidemic originated and where wet markets and game food restaurants serving bat dishes are commonly found (34). Five different coronaviruses were identified, including two previously undescribed coronavirus species: bat coronavirus HKU9 (bat-CoV HKU9) and bat coronavirus HKU10 (bat-CoV HKU10). In addition, we sequenced four complete genomes each of the two putative group 2c coronaviruses (bat-CoV HKU4 and bat-CoV HKU5) that we discovered in Hong Kong (30) and of the putative group 2d coronavirus (bat-CoV HKU9) discovered in the present study, and we compared the 12 genomes with those of other coronaviruses. Based on the results of the present study, we propose two novel subgroups, group 2c and group 2d, among group 2 coronaviruses.

TABLE 1. Bat species captured and associated coronaviruses in the present surveillance study

| Scientific name | Common name | No. of bats tested | No. (%) of bats positive for coronaviruses | Coronavirus(es) (n)^a |
| --- | --- | --- | --- | --- |
| Hipposideros larvatus | Intermediate roundleaf bat | 2 | 0 (0) | |
| Hipposideros armiger | Great roundleaf bat | 26 | 0 (0) | |
| Hipposideros pomona | Pomona roundleaf bat | 1 | 0 (0) | |
| Miniopterus magnater | Greater bent-winged bat | 14 | 0 (0) | |
| Miniopterus pusillus | Lesser bent-winged bat | 13 | 2 (15) | Bat-CoV HKU8 |
| Myotis ricketti | Rickett's big-footed bat | 1 | 0 (0) | |
| Rhinolophus osgoodi | Osgood's horseshoe bat | 1 | 0 (0) | |
| Rhinolophus pusillus | Least horseshoe bat | 12 | 0 (0) | |
| Rhinolophus affinis | Intermediate horseshoe bat | 25 | 0 (0) | |
| Rhinolophus sinicus | Chinese horseshoe bat | 64 | 7 (11) | Bat-CoV HKU2 (6), Bat-SARS-CoV HKU3 (1) |
| Rousettus leschenaulti | Leschenault's rousette | 350 | 43 (12) | Bat-CoV HKU9 (42), Bat-CoV HKU10 (1) |

^a n, number of bats positive for the indicated virus.

MATERIALS AND METHODS 

Sample collection. A total of 509 bats (11 different species) were captured 
from various locations in the Guangdong province of Southern China over a 
7-month period (October 2005 to April 2006). Respiratory and alimentary specimens were collected by procedures described previously (13, 35).

RNA extraction. Viral RNA was extracted from the respiratory and alimentary 
specimens by using QIAamp viral RNA minikit (QIAGEN, Hilden, Germany). 
The RNA was eluted in 50 µl of AVE buffer and was used as the template for
reverse transcription-PCR (RT-PCR). 

RT-PCR of pol gene of coronaviruses using conserved primers and DNA sequencing. Coronavirus screening was performed by amplifying a 440-bp fragment of the pol gene of coronaviruses using conserved primers (5'-GGTTGGGACTATCCTAAGTGTGA-3' and 5'-CCATCATCAGATAGAATCATCATA-3') designed by multiple alignment of the nucleotide sequences of available pol genes of known coronaviruses (29). RT was performed by using a SuperScript III kit (Invitrogen, San Diego, CA). The PCR mixture (25 µl) contained cDNA, PCR buffer (10 mM Tris-HCl [pH 8.3], 50 mM KCl, 3 mM MgCl2, and 0.01% gelatin), 200 µM each deoxynucleoside triphosphate, and 1.0 U of Taq polymerase (Applied Biosystems, Foster City, CA). The mixtures were amplified in 60 cycles of 94°C for 1 min, 48°C for 1 min, and 72°C for 1 min, followed by a final extension at 72°C for 10 min, in an automated thermal cycler (Applied Biosystems). Standard precautions were taken to avoid PCR contamination, and no false positives were observed in negative controls.
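As an illustration only, the two conserved pol primers can be checked quickly for length and G+C content. The sketch below estimates melting temperature with the simple GC-count approximation Tm = 64.9 + 41(nGC - 16.4)/N, a swapped-in rule of thumb that ignores salt and primer concentrations; it is not the method used to design these primers, and the names PRIMERS, gc_content, and basic_tm are hypothetical.

```python
# Quick checks on the two conserved pol primers quoted above.
# basic_tm() uses the GC-count approximation Tm = 64.9 + 41 * (nGC - 16.4) / N,
# a rule of thumb that ignores salt and primer concentration.

PRIMERS = {
    "primer 1": "GGTTGGGACTATCCTAAGTGTGA",
    "primer 2": "CCATCATCAGATAGAATCATCATA",
}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def basic_tm(primer: str) -> float:
    """Approximate melting temperature (degrees C) from GC count and length."""
    p = primer.upper()
    n_gc = p.count("G") + p.count("C")
    return 64.9 + 41 * (n_gc - 16.4) / len(p)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_content(seq):.1%}, Tm ~{basic_tm(seq):.1f} C")
```

Under this crude approximation, primer 1 comes out near 55°C and primer 2 near 51°C, both a few degrees above the 48°C annealing step; nearest-neighbor thermodynamic models would give more reliable estimates.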

The PCR products were gel purified by using a QIAquick gel extraction kit 
(QIAGEN). Both strands of the PCR products were sequenced twice with an 
ABI Prism 3700 DNA analyzer (Applied Biosystems) using the two PCR primers. The sequences of the PCR products were compared to known sequences of
the pol genes of coronaviruses in the GenBank database. 

Viral culture. Two of the samples positive for bat-CoV HKU9 and the sample positive for bat-CoV HKU10 were cultured in LLC-MK2 (rhesus monkey kidney), MRC-5 (human lung fibroblast), FRhK-4 (rhesus monkey kidney), Huh-7.5 (human hepatoma), Vero E6 (African green monkey kidney), and HRT-18 (human colorectal adenocarcinoma) cells.

Complete genome sequencing. Twelve complete genomes of bat-CoV HKU4 (30), bat-CoV HKU5 (30), and the novel bat coronavirus discovered in the present study (bat-CoV HKU9) were amplified and sequenced using the RNA extracted from the alimentary specimens as templates. The RNA was converted to cDNA by a combined random-priming and oligo(dT)-priming strategy. Since the initial results revealed that these coronaviruses were group 2 coronaviruses, the cDNA was amplified by degenerate primers designed by multiple alignment of the genomes of CoV-HKU1 (GenBank accession no. NC_006577), murine hepatitis virus (NC_006852), human coronavirus OC43 (NC_005147), bovine coronavirus (NC_003045), rat sialodacryoadenitis coronavirus (AF207551), equine coronavirus NC99 (AY316300), porcine hemagglutinating encephalomyelitis virus (NC_007732), SARS-CoV (NC_004718), and bat-SARS-CoV HKU3 (DQ022305), and by additional primers designed from the results of the first and subsequent rounds of sequencing. These primer sequences are available on request. The 5' ends of the viral genomes were confirmed by rapid amplification of cDNA ends using a 5'/3' RACE kit (Roche, Germany). Sequences were assembled and manually edited to produce final sequences of the viral genomes.
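Degenerate primers of the kind described above are commonly read off a multiple alignment by collapsing each column into the IUPAC ambiguity code that covers every base seen there. A minimal sketch, assuming the alignment has already been trimmed to the primer-binding region and contains no gaps; the three short sequences are hypothetical, not taken from the genomes listed above, and the function names are illustrative.

```python
from functools import reduce
from typing import List

# IUPAC nucleotide ambiguity codes, keyed by the set of bases each code stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
# Number of distinct bases each code represents.
FOLD = {code: len(bases) for bases, code in IUPAC.items()}


def degenerate_consensus(aligned: List[str]) -> str:
    """Collapse each alignment column into the IUPAC code covering all bases seen there."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    codes = []
    for column in zip(*(s.upper() for s in aligned)):
        bases = frozenset(column)
        if not bases <= frozenset("ACGT"):
            raise ValueError(f"gap or non-ACGT character in column {column}")
        codes.append(IUPAC[bases])
    return "".join(codes)


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    return reduce(lambda total, code: total * FOLD[code], primer.upper(), 1)


if __name__ == "__main__":
    # Hypothetical aligned primer-site sequences from three genomes.
    site = ["ATGGCT", "ATGACT", "ATCGCT"]
    primer = degenerate_consensus(site)
    print(primer, degeneracy(primer))  # ATSRCT 4
```

The degeneracy of the primer mix grows multiplicatively with each ambiguous position, which is why primer sites are normally chosen in the most conserved stretches of the alignment.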

Genome analysis. The nucleotide sequences of the genomes and the deduced 
amino acid sequences of the open reading frames (ORFs) were compared to 
those of other coronaviruses. Phylogenetic tree construction was performed by 
using the neighbor-joining method with CLUSTAL X 1.83. Protein family analysis was performed by using PFAM and InterProScan (1, 2). Prediction of
transmembrane domains was performed by using TMpred and TMHMM (9, 23). 
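CLUSTAL X builds its trees with the neighbor-joining method of Saitou and Nei. The sketch below is not CLUSTAL X code; it is a minimal, self-contained version of the algorithm operating on a precomputed distance matrix (for example, corrected amino acid distances). At each step it joins the pair minimizing Q(i, j) = (n - 2)d(i, j) - r(i) - r(j), where r is the row sum of the distance matrix. The five-taxon matrix in the usage example is a standard textbook toy case, not data from this study, and ties are broken by taking the first pair found.

```python
from typing import List


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Build an unrooted tree (Newick string) by Saitou-Nei neighbor joining."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [[float(x) for x in row] for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # net divergence of each node
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the two joined nodes to their new parent.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        keep = [k for k in range(n) if k not in (i, j)]
        # Distances from the new parent to every remaining node.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"]
    # Join the last three nodes at a single (unrooted) central node.
    a, b, c = nodes
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = (d[0][1] + d[1][2] - d[0][2]) / 2
    lc = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]
    matrix = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(neighbor_joining(taxa, matrix))
```

Bootstrap support, as reported in the figures, would come from repeating this construction on resampled alignments and counting how often each cluster recurs.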

Estimation of synonymous and nonsynonymous substitution rates. The number of synonymous substitutions per synonymous site (Ks) and the number of nonsynonymous substitutions per nonsynonymous site (Ka) for each coding region between each pair of strains were calculated by using the Nei-Gojobori method (Jukes-Cantor) in MEGA 3.1 (11). Since the sequences of three of the four genomes of bat-CoV HKU4 are almost identical and the sequences of three of the four genomes of bat-CoV HKU5 are almost identical, the Ka/Ks ratios for the coding regions in bat-CoV HKU4 and bat-CoV HKU5 were each calculated using one of these three genomes and the remaining genome that possessed more differences. For the four strains of bat-CoV HKU9, six pairwise comparisons were performed for each coding region.
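The Nei-Gojobori counting behind these Ka and Ks values can be sketched in a few dozen lines. This is an illustrative reimplementation under stated assumptions, not MEGA's code: it uses the standard genetic code, counts single-base changes to stop codons as nonsynonymous when tallying sites (MEGA's handling may differ), averages differences over all mutational pathways that avoid intermediate stop codons, skips codons containing gaps or ambiguous bases, and applies the Jukes-Cantor correction to both proportions. The short sequences in the usage lines are hypothetical.

```python
import itertools
import math

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon):
    """Synonymous sites of one codon (changes to stop count as nonsynonymous)."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def codon_differences(c1, c2):
    """Synonymous and nonsynonymous differences averaged over all pathways
    from c1 to c2 that avoid intermediate stop codons."""
    positions = [p for p in range(3) if c1[p] != c2[p]]
    if not positions:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    n_paths = 0
    for order in itertools.permutations(positions):
        current, syn, non, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*" and nxt != c2:
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        if valid:
            syn_total += syn
            non_total += non
            n_paths += 1
    if n_paths == 0:  # every pathway hits a stop codon: count all as nonsynonymous
        return 0.0, float(len(positions))
    return syn_total / n_paths, non_total / n_paths


def jukes_cantor(p):
    """Jukes-Cantor correction of a proportion of differences."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        raise ValueError("too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks, Ka/Ks) for two aligned, in-frame coding sequences."""
    seq1, seq2 = seq1.upper().replace("U", "T"), seq2.upper().replace("U", "T")
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, equal in length, and in frame")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if c1 not in CODE or c2 not in CODE:
            continue  # skip codons containing gaps or ambiguous bases
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += s
        N += 3 - s
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    if S == 0 or N == 0:
        raise ValueError("no synonymous or nonsynonymous sites to compare")
    ka, ks = jukes_cantor(Nd / N), jukes_cantor(Sd / S)
    if ks == 0:
        ratio = math.inf if ka > 0 else math.nan
    else:
        ratio = ka / ks
    return ka, ks, ratio


if __name__ == "__main__":
    print(ka_ks("ATGCTGAAA", "ATGCTCAAA"))  # one synonymous change (CTG -> CTC, Leu)
    print(ka_ks("ATGCTGAAA", "ATGCTGAGA"))  # one nonsynonymous change (AAA -> AGA)
```

For four strains, the six pairwise comparisons mentioned above correspond to itertools.combinations over the strains, with the per-pair ratios then averaged.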

Nucleotide sequence accession numbers. The nucleotide sequences of the 12 
genomes of bat-CoV HKU4, bat-CoV HKU5, and bat-CoV HKU9 have been 
submitted to the GenBank sequence database under accession numbers 
EF065505 to EF065516. 

RESULTS 

Bat surveillance and identification of two novel coronaviruses. A total of 1,018 respiratory and alimentary specimens from 509 bats of 11 different species were obtained in the Guangdong province in Southern China (Table 1). RT-PCR for a 440-bp fragment of the pol gene of coronaviruses was positive in alimentary specimens from 52 (10.2%) and in a respiratory specimen from 1 (0.2%) of the 509 bats. Sequencing results suggested the presence of five different coronaviruses (Table 1 and Fig. 1). The sequences from two samples from lesser bent-winged bats (Miniopterus pusillus) possessed >97% nucleotide identity to a group 1 coronavirus (bat-CoV HKU8) that we described recently from lesser bent-winged bats in Hong Kong (30). The sequences from six alimentary specimens and one respiratory specimen (obtained from one of the six bats with positive alimentary specimens) from Chinese horseshoe bats (Rhinolophus sinicus) possessed >97% nucleotide identity to another group 1 coronavirus (bat-CoV HKU2) that we described recently from Chinese horseshoe bats in Hong Kong (30), and the sequence from one sample from a Chinese horseshoe bat possessed >98% nucleotide identity to bat-SARS-CoV HKU3, which we described recently from Chinese horseshoe bats in Hong Kong (13). The sequences from 42 samples from Leschenault's rousette bats (Rousettus leschenaulti) had <70% nucleotide identity to all known coronaviruses, suggesting a novel group 2 coronavirus (bat-CoV HKU9); the sequence from one sample from a Leschenault's rousette bat had <80% nucleotide identity to all known coronaviruses, suggesting a novel group 1 coronavirus (bat-CoV HKU10).

FIG. 1. Phylogenetic analysis of amino acid sequences of the 393-bp fragment of RNA-dependent RNA polymerase of coronaviruses identified from bats in the present study. The tree was constructed by the neighbor-joining method using the Jukes-Cantor correction, and bootstrap values were calculated from 1,000 trees. The scale bar indicates the estimated number of substitutions per 50 amino acids. Coronaviruses identified in the present study are shown in boldface. Coronaviruses from bats are shaded in gray. HCoV-229E (NC_002645); PEDV, porcine epidemic diarrhea virus (NC_003436); TGEV (NC_002306); FIPV (AY994055); HCoV-NL63 (NC_005831); bat-CoV HKU2 (DQ249235), HKU4 (DQ074652), HKU5 (DQ249219), HKU6 (DQ249224), HKU7 (DQ249226), and HKU8 (DQ249228); CoV-HKU1 (NC_006577); HCoV-OC43 (NC_005147); MHV, murine hepatitis virus (NC_006852); BCoV, bovine coronavirus (NC_003045); PHEV, porcine hemagglutinating encephalomyelitis virus (NC_007732); SDAV, rat sialodacryoadenitis coronavirus; SARS-CoV (human), human SARS coronavirus (NC_004718); SARS-CoV (Civet), civet SARS-like coronavirus (AY304488); bat-SARS-CoV HKU3, bat-SARS-like coronavirus HKU3 (DQ022305); IBV, infectious bronchitis virus (NC_001451); TCoV, turkey coronavirus (AF124991); IBV-like, IBV isolated from peafowl (AY641576). Other abbreviations are as defined in the text.
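The nucleotide identities quoted in the surveillance results are percentages of matching positions between a query fragment and a reference sequence. A minimal sketch, assuming the sequences are already aligned and that columns with a gap in either sequence are excluded (one of several conventions in use); the example sequences and the names ref1 and ref2 are hypothetical.

```python
from typing import Dict, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identical positions over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def best_match(query: str, references: Dict[str, str]) -> Tuple[str, float]:
    """Name and identity of the reference most similar to the query."""
    scores = ((name, percent_identity(query, seq)) for name, seq in references.items())
    return max(scores, key=lambda item: item[1])


if __name__ == "__main__":
    refs = {"ref1": "ACGTACGTAA", "ref2": "TTTTACGTAC"}  # hypothetical, pre-aligned
    print(best_match("ACGTACGTAC", refs))  # ('ref1', 90.0)
```

A query whose best identity to every known coronavirus stays low, as for the bat-CoV HKU9 fragments, is the signal that prompts further characterization; the identity values themselves depend on the alignment and gap convention chosen.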

Viral culture. No cytopathic effect was observed in any of the cell lines inoculated with bat specimens positive for bat-CoV HKU9 and bat-CoV HKU10. Quantitative RT-PCR of the culture supernatants and cell lysates, used to monitor viral replication, was also negative.

Genome organization and coding potential of bat-CoV HKU4, bat-CoV HKU5, and bat-CoV HKU9. Since analysis of the 440-bp fragment of the pol gene of bat-CoV HKU9 suggests a distinct subgroup in group 2 coronaviruses and our previous findings suggest that bat-CoV HKU4 and bat-CoV HKU5 represent another distinct subgroup of group 2 coronaviruses, complete genome sequence data of four strains each of bat-CoV HKU4, bat-CoV HKU5, and bat-CoV HKU9 were obtained by assembly of the sequences of the RT-PCR products from the corresponding individual specimens.

FIG. 2. Genome organizations of bat-CoV HKU4, bat-CoV HKU5, bat-CoV HKU9, and representative coronaviruses from each group. Papain-like proteases (PL1, PL2, and PL) and the nonstructural proteins are represented by white boxes. Hemagglutinin esterase (HE), spike (S), envelope (E), membrane (M), and nucleocapsid (N) are represented by gray boxes.

The sizes of the genomes of bat-CoV HKU4, bat-CoV HKU5, and bat-CoV HKU9 are 30,286 to 30,316 bases, 30,482 to 30,488 bases, and 29,017 to 29,155 bases, respectively, and their G+C contents are 38, 43, and 41% (Table 2). Their genome organizations are similar to those of other coronaviruses, with the characteristic gene order 5'-replicase ORF1ab, spike (S), envelope (E), membrane (M), and nucleocapsid (N)-3' (Fig. 2 and Table 3). Both the 5' and 3' ends contain short untranslated regions. The replicase ORF1ab occupies 20.8 to 21.5 kb of the genomes (Table 3). This ORF encodes a number of putative proteins, including nsp3 (which contains the putative papain-like protease [PLpro]), nsp5 (putative chymotrypsin-like protease [3CLpro]), nsp12 (putative RNA-dependent RNA polymerase [Pol]), nsp13 (putative helicase), and other proteins of unknown function (Table 4). These proteins are produced by proteolytic cleavage of the large replicase polyprotein by PLpro and 3CLpro at specific sites (Table 4).
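The nsp coordinates in Table 4 are 1-based, inclusive residue positions in the replicase polyprotein, so each product is a simple slice. The sketch below, with hypothetical helper names, cuts a placeholder polyprotein at the first five bat-CoV HKU4 boundaries from Table 4 and checks that consecutive products abut; the placeholder string only stands in for a real pp1a sequence, which is not reproduced here.

```python
from typing import Dict, Tuple

Span = Tuple[int, int]

# First five bat-CoV HKU4 replicase products, as 1-based inclusive residue ranges from Table 4.
HKU4_NSP: Dict[str, Span] = {
    "nsp1": (1, 195),
    "nsp2": (196, 847),
    "nsp3": (848, 2784),
    "nsp4": (2785, 3291),
    "nsp5": (3292, 3597),
}


def cleave(polyprotein: str, boundaries: Dict[str, Span]) -> Dict[str, str]:
    """Slice a polyprotein into named products at 1-based inclusive positions."""
    products = {}
    for name, (first, last) in boundaries.items():
        if not 1 <= first <= last <= len(polyprotein):
            raise ValueError(f"{name}: {first}-{last} lies outside the polyprotein")
        products[name] = polyprotein[first - 1:last]
    return products


def is_contiguous(boundaries: Dict[str, Span]) -> bool:
    """True if each product starts immediately after the previous one ends."""
    spans = sorted(boundaries.values())
    return all(prev[1] + 1 == nxt[0] for prev, nxt in zip(spans, spans[1:]))


if __name__ == "__main__":
    # Hypothetical placeholder of the 4,428-residue length given for HKU4 ORF1a in Table 3.
    pp1a = "X" * 4428
    for name, product in cleave(pp1a, HKU4_NSP).items():
        print(name, len(product))
    print("contiguous:", is_contiguous(HKU4_NSP))
```

By the same arithmetic, nsp5 (3CLpro) spans 306 residues in all three viruses according to the Table 4 coordinates. A contiguity check like this applies only within one reading frame: nsp11 and nsp12 share a start position because nsp12 lies in the pp1ab product generated by the ribosomal frameshift.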

Bat-CoV HKU4 and bat-CoV HKU5 have the same genome structure (Fig. 2). They also possess the same putative transcription regulatory sequence (TRS) motif, 5'-ACGAAC-3', which is located at the 3' end of the leader sequence and precedes each ORF except NS3c and N (Table 3). This TRS has also been shown to be the TRS for SARS-CoV (10). No TRS was observed upstream of NS3c, whereas the TRS for N is ACGAAU in all eight strains of bat-CoV HKU4 and bat-CoV HKU5. Similar to group 2b coronaviruses, the genomes of bat-CoV HKU4 and bat-CoV HKU5 have a single putative PLpro, which is homologous to PL2pro of group 1 and group 2a coronaviruses and to PLpro of group 3 coronaviruses (Fig. 3). In the genomes of bat-CoV HKU4 and bat-CoV HKU5, four ORFs that encode putative nonstructural proteins (NS3a, NS3b, NS3c, and NS3d) were observed between S and E. A BLAST search revealed no amino acid similarities between these four putative nonstructural proteins and other known proteins, and no functional domains were identified by PFAM and InterProScan. TMHMM and
TMpred analyses showed three putative transmembrane do- 
TABLE 3. Coding potential and putative transcription regulatory sequences of the genomes of bat-CoV HKU4, bat-CoV HKU5, and bat-CoV HKU9

| Coronavirus | ORF | Start-end (nucleotide position) | No. of nucleotides | No. of amino acids | Frame | Putative TRS: nucleotide position in genome | Putative TRS: sequence |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Bat-CoV HKU4 | 1a | 267-13550 | 13,284 | 4,428 | +3 | 63 | ACGAAC(198)AUG |
| | 1b | 13550-21625 | 8,076 | 2,692 | +2 | | |
| | S | 21570-25628 | 4,059 | 1,352 | +3 | 21519 | ACGAAC(45)AUG |
| | NS3a | 25655-25930 | 276 | 91 | +2 | 25636 | ACGAAC(13)AUG |
| | NS3b | 25948-26307 | 360 | 119 | +1 | 25940 | ACGAACUUAUG |
| | NS3c | 26111-26968 | 858 | 285 | +2 | | |
| | NS3d | 26984-27667 | 684 | 227 | +2 | 26976 | ACGAACUUAUG |
| | E | 27737-27985 | 249 | 82 | +2 | 27730 | ACGAACUAUG |
| | M | 28000-28659 | 660 | 219 | +1 | 27985 | ACGAAC(9)AUG |
| | N | 28697-29968 | 1,272 | 423 | +2 | 28674 | ACGAAU(16)AUG |
| Bat-CoV HKU5 | 1a | 260-13681 | 13,422 | 4,474 | +2 | 61 | ACGAAC(193)AUG |
| | 1b | 13681-21798 | 8,118 | 2,706 | +1 | | |
| | S | 21725-25798 | 4,074 | 1,357 | +2 | 21674 | ACGAAC(45)AUG |
| | NS3a | 25761-26126 | 366 | 121 | +3 | 25807 | ACGAACUUAUG |
| | NS3b | 26139-26498 | 360 | 119 | +3 | 26130 | ACGAACUUCAUG |
| | NS3c | 26380-27150 | 771 | 256 | +1 | | |
| | NS3d | 27160-27831 | 672 | 223 | +1 | 27152 | ACGAACUUAUG |
| | E | 27909-28157 | 249 | 82 | +3 | 27902 | ACGAACUAUG |
| | M | 28172-28834 | 663 | 220 | +2 | 28157 | ACGAAC(9)AUG |
| | N | 28884-30167 | 1,284 | 427 | +3 | 28861 | ACGAAU(16)AUG |
| Bat-CoV HKU9 | 1a | 229-12951 | 12,723 | 4,241 | +1 | 71 | ACGAAC(152)AUG |
| | 1b | 12951-21020 | 8,070 | 2,690 | +3 | | |
| | S | 20974-24798 | 3,825 | 1,274 | +1 | 20926 | ACGAAC(42)AUG |
| | NS3 | 24795-25457 | 663 | 220 | +3 | 24786 | ACGAACAGUAUG |
| | E | 25457-25696 | 240 | 79 | +2 | 25448 | UCGAACUAUAAUG |
| | M | 25689-26357 | 669 | 222 | +3 | 25662 | ACGAAC(21)AUG |
| | N | 26419-27825 | 1,407 | 468 | +1 | 26408 | ACGAACCUAUUAUG |
| | NS7a | 27869-28426 | 558 | 185 | +2 | 27863 | ACGAACAUG |
| | NS7b | 28433-28882 | 450 | 149 | +2 | 28427 | ACGAACAUG |


TABLE 4. Characteristics of putative nonstructural proteins of replicase in bat-CoV HKU4, bat-CoV HKU5, and bat-CoV HKU9

| nsp | Putative function or domain^a | Amino acids (first residue-last residue), bat-CoV HKU4 | Bat-CoV HKU5 | Bat-CoV HKU9 |
| --- | --- | --- | --- | --- |
| nsp1 | Unknown | M1-G195 | M1-G195 | M1-G175 |
| nsp2 | Unknown | 196-847 | D196-G851 | 176-772 |
| nsp3 | Putative PLpro domain | M848-G2784 | 852-2829 | 773-2609 |
| nsp4 | Hydrophobic domain | 2785-3291 | 2830-3337 | 2610-3103 |
| nsp5 | 3CLpro | 3292-3597 | 3338-3643 | 3104-3409 |
| nsp6 | Hydrophobic domain | 3598-3889 | 3644-3935 | 3410-3699 |
| nsp7 | Unknown | 3890-3972 | 3936-4018 | 3700-3782 |
| nsp8 | Unknown | 3973-4171 | 4019-4217 | 3783-3982 |
| nsp9 | Unknown | 4172-4281 | 4218-4327 | 3983-4094 |
| nsp10 | Unknown | 4282-4420 | 4328-4466 | 4095-4233 |
| nsp11 | Unknown (short peptide at the end of ORF1a) | 4421-4434 | 4467-4480 | 4234-4248 |
| nsp12 | Pol | 4421-5354 | S4467-Q5400 | 4234-5165 |
| nsp13 | Hel | 5355-5952 | 5401-5998 | 5166-5766 |
| nsp14 | ExoN | 5953-6475 | 5999-6522 | 5767-6296 |
| nsp15 | XendoU | G6476-Q6817 | 6523-6871 | 6297-6633 |
| nsp16 | 2'-O-MT | 6818-7119 | 6872-7179 | 6634-6930 |

^a PLpro, papain-like protease; 3CLpro, chymotrypsin-like protease; Pol, RNA-dependent RNA polymerase; Hel, helicase; ExoN, 3'-to-5' exonuclease; XendoU, poly(U)-specific endoribonuclease; 2'-O-MT, S-adenosylmethionine-dependent 2'-O-ribose methyltransferase.


mains in NS3d of bat-CoV HKU4 (residues 37 to 59, 71 to 90, 
and 94 to 111) and bat-CoV HKU5 (residues 32 to 54, 67 to 84, and 89 to 108). Similar to group 2a and 2b coronaviruses, the 3' untranslated regions of the two genomes contain predicted bulged stem-loop structures 18 to 81 and 19 to 82 nucleotides downstream of the N genes (nucleotide positions 29986 to 30049 in bat-CoV HKU4 and 30186 to 30249 in bat-CoV HKU5) (Fig. 4). Downstream of the bulged stem-loop structures, 77 to 126 and 78 to 129 nucleotides downstream of the N genes (nucleotide positions 30045 to 30094 in bat-CoV HKU4 and 30245 to 30296 in bat-CoV HKU5), pseudoknot structures are present (Fig. 4).
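Table 3 encodes each putative TRS as its genome position plus either a spacer length, as in ACGAAC(45)AUG, or the literal sequence up to the AUG, as in ACGAACUUAUG. A minimal sketch, with hypothetical function names, that scans a sequence for the ACGAAC core, converts a Table 3 entry into the implied AUG position, and derives ORF length and frame. Two assumptions are made explicit: the frame is taken as ((start - 1) mod 3) + 1 relative to genome position 1, which reproduces the Frame column for the bat-CoV HKU4 rows checked below, and the stop codon is excluded from the amino acid count except for ORF1a and ORF1b, which Table 3 lists as nucleotides divided by 3.

```python
import re
from typing import List, Tuple

# Table 3 notation: core sequence, spacer length in parentheses, then the AUG.
SPACER_NOTATION = re.compile(r"^([ACGU]+)\((\d+)\)AUG$")


def find_core(genome: str, core: str = "ACGAAC") -> List[int]:
    """1-based positions of every occurrence of the TRS core (DNA or RNA input)."""
    g = genome.upper().replace("T", "U")
    c = core.upper().replace("T", "U")
    return [i + 1 for i in range(len(g) - len(c) + 1) if g.startswith(c, i)]


def orf_start_from_trs(trs_position: int, trs_entry: str) -> int:
    """Genome position of the AUG implied by a Table 3 TRS entry."""
    match = SPACER_NOTATION.match(trs_entry)
    if match:
        offset = len(match.group(1)) + int(match.group(2))
    elif trs_entry.endswith("AUG"):
        offset = len(trs_entry) - 3  # literal entry: the AUG is its last three bases
    else:
        raise ValueError("TRS entry must end with AUG")
    return trs_position + offset


def orf_stats(start: int, end: int, ends_in_stop: bool = True) -> Tuple[int, int, int]:
    """(nucleotides, amino acids, frame) for an ORF with 1-based inclusive coordinates."""
    n_nt = end - start + 1
    n_aa = n_nt // 3 - (1 if ends_in_stop else 0)
    frame = (start - 1) % 3 + 1
    return n_nt, n_aa, frame


if __name__ == "__main__":
    # Bat-CoV HKU4 rows from Table 3.
    s_start = orf_start_from_trs(21519, "ACGAAC(45)AUG")
    print("S", s_start, orf_stats(s_start, 25628))
    ns3b_start = orf_start_from_trs(25940, "ACGAACUUAUG")
    print("NS3b", ns3b_start, orf_stats(ns3b_start, 26307))
    print("1a", orf_stats(267, 13550, ends_in_stop=False))
```

These calls reproduce the Table 3 start positions, lengths, and frames for the bat-CoV HKU4 S, NS3b, and ORF1a rows. Variant cores such as the UCGAAC preceding E of bat-CoV HKU9 would need the core parameter changed or a mismatch-tolerant search.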

For the genome of bat-CoV HKU9, the putative TRS motif 5'-ACGAAC-3' is also observed, similar to bat-CoV HKU4, bat-CoV HKU5, and the group 2b coronaviruses. This putative TRS is present at the 3' end of the leader sequence and precedes each ORF except E, for which the putative TRS is UCGAAC (Table 3). Interestingly, the P1 position of the putative 3CLpro cleavage site at the junction between nsp9 and nsp10 is occupied by histidine instead of glutamine. This exception was also previously observed at the junction between the helicase and nsp14 in CoV-HKU1 and HCoV-NL63, where the P1 positions are also occupied by histidine instead of glutamine (26, 28). One ORF, which encodes a putative nonstructural protein (NS3), is observed between the S and E genes.
FIG. 3. Multiple alignments of PLpro of SARS-CoV, BtCoV/133/05 (NC_008315), bat-CoV HKU4, bat-CoV HKU5, bat-CoV HKU9, and IBV and PL2pro of HCoV-229E, TGEV, HCoV-OC43, and MHV. Amino acids conserved across all coronaviruses are highlighted in black. Amino acids conserved in 60 to 90% of the coronaviruses are highlighted in gray. The conserved Cys and His residues of the catalytic dyad are marked with an asterisk, the conserved postulated metal-chelating Cys and His residues are marked with a "#" symbol, and the conserved aromatic amino acid immediately downstream of the catalytic Cys is marked with a "+" symbol.


Notably, at its 3' end the genome contains the longest stretch of nucleotides (1,289 bases) downstream of the N gene among all coronaviruses with complete genome sequences available, and two ORFs that encode putative nonstructural proteins (NS7a and NS7b) are observed in this region. A BLAST search revealed no amino acid similarities between these three putative nonstructural proteins (NS3, NS7a, and NS7b) and other known proteins, and no functional domain was identified by PFAM and InterProScan. TMHMM and TMpred analyses showed three putative transmembrane domains in NS3 (residues 30 to 47, 54 to 76, and 80 to 99). No bulged stem-loop or pseudoknot structures similar to those in other group 2 coronaviruses are observed downstream of N, NS7a, or NS7b in the bat-CoV HKU9 genomes.

FIG. 4. Predicted bulged stem-loop and pseudoknot structures downstream of N in genomes of bat-CoV HKU4 and bat-CoV HKU5. Stop codons for the N genes are boxed. Broken lines indicate alternative base pairing.
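TMHMM and TMpred, the tools used for the transmembrane predictions above, rely on trained models. A much cruder, swapped-in technique that conveys the underlying idea is a Kyte-Doolittle hydropathy scan: slide a 19-residue window along the protein and flag windows whose mean hydropathy exceeds 1.6, the window size and cutoff Kyte and Doolittle suggested for membrane-spanning segments. The sketch below is not a reimplementation of either tool, its function name is hypothetical, and the toy protein is invented rather than taken from NS3 or NS3d.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(protein: str, window: int = 19,
                         threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Merged 1-based spans covered by windows whose mean hydropathy exceeds threshold."""
    seq = protein.upper()
    segments: List[Tuple[int, int]] = []
    for i in range(len(seq) - window + 1):
        mean = sum(KD[aa] for aa in seq[i:i + window]) / window
        if mean > threshold:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1] + 1:
                segments[-1] = (segments[-1][0], end)  # extend an overlapping run
            else:
                segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Hypothetical toy protein: a 25-residue leucine stretch flanked by lysines.
    toy = "M" + "K" * 10 + "L" * 25 + "K" * 10
    print(hydrophobic_segments(toy))  # [(7, 41)]
```

The merged span overshoots the leucine stretch itself (residues 12 to 36) because it is the union of whole windows; dedicated predictors such as TMHMM model segment boundaries and topology explicitly, which is why they were used here.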

Phylogenetic analyses. The phylogenetic trees constructed using the amino acid sequences of the 3CLpro, Pol, helicase, S, and N of bat-CoV HKU4, bat-CoV HKU5, bat-CoV HKU9, and other coronaviruses are shown in Fig. 5, and the corresponding pairwise amino acid identities are shown in Table 2. For all five genes, bat-CoV HKU4, bat-CoV HKU5, and bat-CoV HKU9 possess higher amino acid identities to the homologous genes in other group 2 coronaviruses than to those of group 1 and group 3 coronaviruses (Table 2). In all five trees, all strains of bat-CoV HKU4 and bat-CoV HKU5 and another recently described coronavirus strain (24) clustered together, with bootstrap values of 1,000 (out of 1,000 replicates) in all cases, forming a distinct subgroup (Fig. 5). Within this subgroup, all four strains of bat-CoV HKU4 clustered with the recently described strain (BtCoV/133/05) (24), and all four strains of bat-CoV HKU5 clustered separately, forming two distinct sublineages. Furthermore, in all five trees, all strains of bat-CoV HKU9 clustered together, with bootstrap values of 1,000 in all cases, forming another distinct subgroup (Fig. 5). Based on both phylogenetic tree analysis and amino acid differences, the strains of the bat-CoV HKU9 subgroup were more closely related to the group 2b coronaviruses than to the other group 2 coronaviruses (Fig. 5 and Table 2). We propose two novel subgroups of coronavirus, group 2c and group 2d, to describe these two distinct subgroups, respectively.
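A bootstrap value of 1,000 means the cluster appeared in every one of the 1,000 trees built from resampled data sets. The resampling step is simple to sketch: each replicate alignment is built by drawing alignment columns with replacement. In the minimal sketch below, tree inference is passed in as a function; clades_by_identity is a deliberately trivial, hypothetical stand-in that only groups identical sequences and would be replaced by a real method such as neighbor joining. The toy alignment is hypothetical.

```python
import random
from typing import Callable, Dict, FrozenSet, Iterable, Iterator, Set

Alignment = Dict[str, str]


def bootstrap_replicates(alignment: Alignment, n: int, seed: int = 0) -> Iterator[Alignment]:
    """Yield n pseudo-alignments, each built by sampling columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all sequences must have the same aligned length")
    rng = random.Random(seed)  # seeded so results are reproducible
    for _ in range(n):
        columns = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(alignment[name][c] for c in columns) for name in names}


def clade_support(alignment: Alignment, clade: Iterable[str], n: int,
                  infer_clades: Callable[[Alignment], Set[FrozenSet[str]]],
                  seed: int = 0) -> int:
    """Number of replicates whose inferred tree contains the given clade."""
    target = frozenset(clade)
    return sum(target in infer_clades(rep) for rep in bootstrap_replicates(alignment, n, seed))


def clades_by_identity(alignment: Alignment) -> Set[FrozenSet[str]]:
    """Toy stand-in for tree inference: groups taxa whose sequences are identical."""
    groups: Dict[str, Set[str]] = {}
    for name, seq in alignment.items():
        groups.setdefault(seq, set()).add(name)
    return {frozenset(g) for g in groups.values()}


if __name__ == "__main__":
    toy = {"a": "AACG", "b": "AACG", "c": "TTGA"}  # hypothetical alignment
    print(clade_support(toy, {"a", "b"}, 1000, clades_by_identity))  # 1000
```

Because a and b are identical in the toy alignment, they group together in every replicate, which is the analogue of the 1,000/1,000 support reported for the group 2c and 2d clusters.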

Estimation of synonymous and nonsynonymous substitution rates. The Ka/Ks ratios for the various coding regions in bat-CoV HKU4, bat-CoV HKU5, and bat-CoV HKU9 are shown in Table 5. For bat-CoV HKU4, the numbers of synonymous and nonsynonymous mutations were small. Therefore, the Ka/Ks ratios of the various coding regions, for example, the exceptionally high Ka/Ks ratios of nsp6, NS3c, and N, were not conclusive. For bat-CoV HKU5, the Ka/Ks ratios of the various coding regions were small, implying that the genes were stably evolving. Notably, the Ka/Ks ratio for NS3c of bat-CoV HKU5 is 0.027, which suggests that this gene is expressed and stably evolving. However, NS3c possesses neither a TRS nor an internal ribosomal entry site (IRES). Further experiments are necessary to elucidate whether NS3c is expressed and, if so, what signal sequence is involved in ribosomal recognition. For bat-CoV HKU9, the mean Ka/Ks ratios of NS7a and NS7b (0.961 and 0.529, respectively) were significantly higher than those of other coding regions, implying that these two genes are rapidly evolving.

DISCUSSION 

Two putative new subgroups of group 2 coronaviruses, 2c and 2d, are described. The four strains of bat-CoV HKU4 and the four strains of bat-CoV HKU5 formed two distinct branches in the putative subgroup 2c lineage in all five phylogenetic trees analyzed (Fig. 5). Moreover, all strains of bat-CoV HKU4 were found in lesser bamboo bats, whereas all strains of bat-CoV HKU5 were found in Japanese pipistrelles (30). These findings support the view that bat-CoV HKU4 and bat-CoV HKU5 are two separate coronavirus species. Since bat-CoV HKU4 and bat-CoV HKU5 have the same genome organization and share the same TRS, we speculate that these two coronaviruses originated from the same ancestor and that their subsequent divergence into two separate species was due to adaptation to different hosts and ecological niches. As for bat-CoV HKU9, the S and N genes showed marked nucleotide polymorphism and amino acid sequence changes, but the amino acid sequences of 3CLpro, Pol, and helicase are relatively conserved
FIG. 5. Phylogenetic analysis of chymotrypsin-like protease (3CLpro), RNA-dependent RNA polymerase (Pol), helicase (Hel), spike (S), and nucleocapsid (N) of bat-CoV HKU4, bat-CoV HKU5, and bat-CoV HKU9. The trees were constructed by the neighbor-joining method using the Jukes-Cantor correction, and bootstrap values were calculated from 1,000 trees. We included 327, 949, 609, 1,661, and 582 amino acid positions in 3CLpro, Pol, helicase, S, and N, respectively, in the analysis. The scale bar indicates the estimated number of substitutions per 10 amino acids. Abbreviations are as defined in the text or in the legend to Fig. 1.


(Fig. 5). Furthermore, all 42 strains of bat-CoV HKU9 were
found in the same bat species, Leschenault’s rousette. These
findings support the view that all 42 strains of bat-CoV
HKU9 belong to one coronavirus species. Complete genome
sequencing of more bat-CoV HKU9 strains may reveal distinct
genotypes and even recombination events, as in the case of
CoV-HKU1 (33). Based on phylogenetic tree analysis, although
the coronaviruses of group 2c (bat-CoV HKU4 and bat-CoV
HKU5) and group 2d (bat-CoV HKU9) are more closely
related to the other group 2 coronaviruses, they formed branches
distinct from the group 2a and 2b coronaviruses. Furthermore,
bat-CoV HKU4, bat-CoV HKU5, and bat-CoV HKU9 of
these two newly proposed subgroups possessed additional
genomic features different from those of other group 2
coronaviruses (Table 6). For the coding potentials of the genomes,
group 2a coronaviruses possess PL1pro and PL2pro, but group
2b, 2c, and 2d coronaviruses possess only one PLpro, which is
homologous to PL2pro. It is noteworthy that in a recently
published article, the authors mentioned that no PLpro
was identified in nsp3 of the genome of BtCoV/133/05
(NC_008315, >95% overall nucleotide identity with bat-CoV
HKU4) (24). However, careful analysis of the BtCoV/133/05 nsp3
by multiple alignment and a search for the conserved domains and
amino acid residues (37) showed that PLpro is present in
the genome of BtCoV/133/05, with the conserved Cys and His
residues of the catalytic dyad, the conserved aromatic amino acid
residue (Trp, Phe, or Tyr) immediately downstream of the
catalytic Cys, and the postulated metal-chelating Cys and His
residues of the zinc fingers (Fig. 3). The genomes of group 2a
coronaviruses, but not those of group 2b, 2c, and 2d
coronaviruses, encode hemagglutinin esterase. The genomes of group
2b coronaviruses, but not those of group 2a, 2c, and 2d
coronaviruses, contain several small ORFs between the M and N
genes. The genomes of group 2d coronaviruses, but not those of
group 2a, 2b, and 2c coronaviruses, contain two ORFs
downstream of the N gene. As for the TRS, the sequence for the
TABLE 5. Estimation of nonsynonymous and synonymous substitution rates in the
genomes of bat-CoV HKU4, bat-CoV HKU5, and bat-CoV HKU9

                                    Ka/Ks ratio
Coding region   Bat-CoV HKU4            Bat-CoV HKU5            Bat-CoV HKU9^a
nsp1            0.031                   Ka = 0, Ks = 0.03711    0.247
nsp2            0.133                   0.061                   0.131
nsp3            0.154                   0.070                   0.091
nsp4            0.155                   0.045                   0.066
nsp5            Ka = 0, Ks = 0.00239    0.016                   0.035
nsp6            0.317                   0.076                   0.067
nsp7            Ka = 0, Ks = 0.00904    0.066                   0.020
nsp8            Ka = 0, Ks = 0          0.011                   0.025
nsp9            Ka = 0, Ks = 0.00691    0.021                   0.019
nsp10           Ka = 0, Ks = 0          0.050                   0.021
nsp11           Ka = 0, Ks = 0          Ka = 0, Ks = 0          0.283
nsp12           Ka = 0, Ks = 0.00163    0.003                   0.027
nsp13           Ka = 0, Ks = 0          0.009                   0.011
nsp14           Ka = 0, Ks = 0          0.007                   0.028
nsp15           Ka = 0, Ks = 0.00665    0.091                   0.044
nsp16           Ka = 0, Ks = 0          0.018                   0.081
S               0.010                   0.127                   0.170
NS3 / NS3a      0.187                   Ka = 0.00181, Ks = 0    0.234
NS3b            0.308                   0.201
NS3c            1.205                   0.027
NS3d            Ka = 0.00096, Ks = 0    0.166
E               Ka = 0, Ks = 0.00865    Ka = 0, Ks = 0.03392    0.108
M               Ka = 0, Ks = 0.00325    0.014                   0.097
N               0.473                   0.060                   0.096
NS7a                                                            0.961
NS7b                                                            0.529

^a Mean of six comparisons.


TRS of group 2a coronaviruses is CUAAAC and that of the
group 2b, 2c, and 2d coronaviruses is ACGAAC (10, 12, 16).
For the E gene, a TRS is present in group 2b, 2c, and 2d
coronaviruses but not in group 2a coronaviruses, which instead
use an IRES for translation of E. The genomes of group 2a, 2b,
and 2c coronaviruses, but not those of group 2d coronaviruses,
contain bulged stem-loop and pseudoknot structures downstream
of the N gene.
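The subgroup comparison above, summarized in Table 6, amounts to asking which characteristics take a value in one subgroup that no other subgroup shares. The hypothetical Python helper below transcribes Table 6 into a dictionary (the feature keys are shortened labels chosen for this sketch) and lists, for a given subgroup, the characteristics that set it apart on their own.

```python
from typing import Dict, List

# Table 6 transcribed as {characteristic: {subgroup: value}}.
TABLE_6: Dict[str, Dict[str, str]] = {
    "papain-like protease": {"2a": "PL1pro and PL2pro", "2b": "PLpro", "2c": "PLpro", "2d": "PLpro"},
    "hemagglutinin esterase": {"2a": "+", "2b": "-", "2c": "-", "2d": "-"},
    "small ORFs between M and N": {"2a": "-", "2b": "+", "2c": "-", "2d": "-"},
    "NS7a and NS7b downstream of N": {"2a": "-", "2b": "-", "2c": "-", "2d": "+"},
    "TRS sequence": {"2a": "CUAAAC", "2b": "ACGAAC", "2c": "ACGAAC", "2d": "ACGAAC"},
    "TRS/IRES for E": {"2a": "IRES", "2b": "TRS", "2c": "TRS", "2d": "TRS"},
    "stem-loop and pseudoknot downstream of N": {"2a": "+", "2b": "+", "2c": "+", "2d": "-"},
}


def unique_features(table: Dict[str, Dict[str, str]], subgroup: str) -> List[str]:
    """Characteristics whose value in `subgroup` differs from every other subgroup."""
    result = []
    for feature, values in table.items():
        own = values[subgroup]
        # Keep the feature only if no other subgroup shares this subgroup's value.
        if all(value != own for group, value in values.items() if group != subgroup):
            result.append(feature)
    return result


if __name__ == "__main__":
    for group in ("2a", "2b", "2c", "2d"):
        print(group, unique_features(TABLE_6, group))
```

Applied to this transcription, the helper returns an empty list for group 2c: no single characteristic in Table 6 separates it from all of the other subgroups, so it is distinguished by its combination of characters together with the phylogenetic analysis.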

Coronaviruses are probably better classified into group 1
(subgroups 1a and 1b), group 2 (subgroups 2a, 2b, 2c, and 2d),
and group 3 than into seven groups. Traditionally,
coronaviruses have been classified into groups 1, 2, and 3. When
SARS-CoV was first identified and its genome was sequenced, it was
proposed that it constituted a fourth group of coronavirus (17,
21). However, after more extensive phylogenetic analyses, it
was suggested that SARS-CoV probably represents a distant
relative of group 2 coronaviruses, and it was subsequently
classified as a group 2b coronavirus (4, 22). In 2005, we and
another group in mainland China independently described
additional members of group 2b coronaviruses (13, 15). Recently,
we described the discovery of six novel coronaviruses from bats
in Hong Kong (30). Phylogenetic analysis of the pol and
helicase genes showed that two of them, bat-CoV HKU4 and
bat-CoV HKU5, probably represent a novel subgroup in group
2 coronaviruses. Subsequently, another group reported similar
diversity in coronaviruses found in bats in mainland China,
and they proposed that coronaviruses should be classified into
five groups, instead of groups 1, 2a, 2b, 2c, and 3 (24). In the
present study, we discovered another distinct subgroup of
coronaviruses (bat-CoV HKU9). We also performed complete
genome sequencing of four strains each of bat-CoV HKU4,
TABLE 6. Comparison of characteristics in the genomes of group 2a, 2b, 2c,
and 2d coronaviruses

                                          Group 2 coronavirus
Characteristics^a                         2a                  2b       2c       2d
Coding potential
  Papain-like protease                    PL1pro and PL2pro   PLpro    PLpro    PLpro
  Hemagglutinin esterase                  +                   -        -        -
  Small ORFs between M and N              -                   +        -        -
  NS7a and NS7b downstream of N           -                   -        -        +
TRS
  TRS sequence                            CUAAAC              ACGAAC   ACGAAC   ACGAAC
  TRS/IRES for E                          IRES                TRS      TRS      TRS
Stem-loop and pseudoknot structures
  downstream of N                         +                   +        +        -

^a TRS, transcription regulatory sequence; IRES, internal ribosome entry site.


bat-CoV HKU5, and bat-CoV HKU9. This large amount of
genome sequence data enabled us to perform a thorough
comparative analysis of the genomes of the various groups of
coronaviruses. The results showed that the amino acid identities in
the various ORFs among the group 2 coronaviruses were
significantly higher than those between group 2 coronaviruses and
the group 1 and 3 coronaviruses. Phylogenetic trees
constructed using 3CLpro, Pol, helicase, S, and N all showed that
the group 2a, 2b, 2c, and 2d coronaviruses are more closely
related to each other than to the group 1 and 3 coronaviruses
(Fig. 5). These results indicate that the group 2 coronaviruses
probably originated from one common ancestor before they diverged
into the four subgroups; therefore, it would be more logical and
informative to classify them as subgroups of group 2
coronaviruses.
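The legend to Fig. 5 states that the trees were built by neighbor-joining with the Jukes-Cantor correction. As an illustrative sketch only, the Python below computes the uncorrected proportion of differing amino acids (p-distance) between aligned sequences and applies the 20-state generalization of the Jukes-Cantor correction, d = -(19/20) ln(1 - 20p/19), yielding the pairwise distance matrix that a neighbor-joining step would take as input. Pairwise gap deletion, the 20-state form of the correction, and all function names are assumptions of this sketch; the legend does not specify the software or its settings, and the neighbor-joining clustering itself is not shown.

```python
import math
from itertools import combinations
from typing import Dict, Tuple


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing residues over aligned positions without gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # pairwise deletion: skip positions with a gap in either sequence
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no gap-free positions to compare")
    return differing / compared


def jukes_cantor_protein(p: float, states: int = 20) -> float:
    """Jukes-Cantor-corrected distance assuming `states` equally likely residues."""
    b = (states - 1) / states
    if p >= b:
        return math.inf  # saturated: the correction is undefined
    return -b * math.log(1 - p / b)


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Symmetric pairwise corrected distances, the input to a neighbor-joining step."""
    matrix: Dict[Tuple[str, str], float] = {}
    for (name_a, seq_a), (name_b, seq_b) in combinations(alignment.items(), 2):
        d = jukes_cantor_protein(p_distance(seq_a, seq_b))
        matrix[(name_a, name_b)] = d
        matrix[(name_b, name_a)] = d
    return matrix


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, not sequences from this study).
    toy = {"virus_x": "MKTAYIA", "virus_y": "MKSAYIA", "virus_z": "MKTAY-A"}
    for pair, d in sorted(distance_matrix(toy).items()):
        print(pair, round(d, 4))
```

The correction always returns a distance at least as large as p, because multiple substitutions at the same site are invisible in a simple count of differences.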

This is the first time that NS7a and NS7b downstream of the N
gene have been observed in group 2 coronaviruses. Previously,
feline infectious peritonitis virus (FIPV), a group 1 coronavirus,
was the only coronavirus known to possess two genes
downstream of the N gene (18). FIPV infects macrophages in a
variety of tissues systemically, whereas feline enteric coronavirus
(FECV), a coronavirus closely related to FIPV, is restricted
to replication in enterocytes. The FECV genome has been found to
lack the 300 nucleotides present at the 3' end of the FIPV genome,
suggesting that this region may be important for virulence.
Recently, it has been shown that an isogenic deletion mutant of
FIPV missing the 7ab cluster protected cats against lethal
challenge by FIPV, which makes the mutant a potential live
attenuated vaccine candidate (7). In addition to FIPV, the
genome of porcine transmissible gastroenteritis virus (TGEV)
also possesses one gene downstream of N (25). This gene
encodes a hydrophobic protein that associates with endoplasmic
reticulum and cell surface membranes in TGEV-infected
cells, suggesting that it may have a role in the membrane
association of replication complexes or in assembly of the virus
(25). In the present comparative genomic analysis, ORFs
downstream of the N gene were not found in any coronaviruses
other than group 1a coronaviruses and bat-CoV HKU9 (Fig. 2).
While the presence of a TRS suggests that NS7a and NS7b of
bat-CoV HKU9 are probably expressed, the high Ka/Ks ratio
implies that these two genes are under high selective pressure
and thus are rapidly evolving, which may be due to recent
acquisition by recombination. Further experiments are needed to
delineate the function and essentiality of NS7a and NS7b in
bat-CoV HKU9.

The huge diversity of coronaviruses is probably a result of
both the high mutation rate of RNA viruses, due to the
infidelity of their polymerases, and a high chance of
recombination as a result of their unique replication mechanism.
Before the SARS epidemic in 2003, a total of 19 coronaviruses
(2 human, 13 mammalian, and 4 avian) were known. Since the
SARS epidemic, two novel human coronaviruses have been
discovered (5, 27, 29). In the past two years, at least 10
previously unrecognized coronaviruses from bats have been
described in Hong Kong and mainland China (13, 15, 20, 24, 30).
In addition to the generation of a large number of coronavirus
species, recombination has also resulted in the generation of
different genotypes within a particular coronavirus species. This is
exemplified by the presence of at least three genotypes in
CoV-HKU1 as a result of recombination (33). The astonishing
diversity of coronaviruses in bats implies that there are
probably many other unknown coronaviruses in other animal
species. Further molecular epidemiological studies in bats of
other countries, as well as in other animals, and complete
genome sequencing will shed more light on coronavirus
diversity and the evolutionary histories of these viruses.